STOCK PRICE PREDICTION


1. | INTRODUCTION 🔎

Stock Price Prediction is a machine learning project that aims to predict the future value of a stock from its past performance and other market conditions. In general, such models can draw on historical stock prices, news articles, economic indicators, and other relevant information. This notebook uses only the historical daily prices in the supplied CSV file and fits a linear regression to the closing price. The resulting predictions can help investors make informed decisions about buying or selling stocks.

2. | IMPORT NECESSARY LIBRARIES

In [1]:
import pandas as pd
import numpy as np
In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
import chart_studio.plotly as py
In [4]:
import plotly.graph_objs as go
from plotly.offline import plot
In [5]:
from matplotlib.pylab import rcParams
rcParams['figure.figsize']=20,10
from keras.models import Sequential
from keras.layers import LSTM,Dropout,Dense

3. | Import the given dataset from a CSV file

In [6]:
SP=pd.read_csv("SP data.csv")
SP
Out[6]:
            Date        Open        High         Low       Close   Adj Close    Volume
0     2018-02-05  262.000000  267.899994  250.029999  254.259995  254.259995  11896100
1     2018-02-06  247.699997  266.700012  245.000000  265.720001  265.720001  12595800
2     2018-02-07  266.579987  272.450012  264.329987  264.559998  264.559998   8981500
3     2018-02-08  267.079987  267.619995  250.000000  250.100006  250.100006   9306700
4     2018-02-09  253.850006  255.800003  236.110001  249.470001  249.470001  16906900
...          ...         ...         ...         ...         ...         ...       ...
1004  2022-01-31  401.970001  427.700012  398.200012  427.140015  427.140015  20047500
1005  2022-02-01  432.959991  458.480011  425.540009  457.130005  457.130005  22542300
1006  2022-02-02  448.250000  451.980011  426.480011  429.480011  429.480011  14346000
1007  2022-02-03  421.440002  429.260010  404.279999  405.600006  405.600006   9905200
1008  2022-02-04  407.309998  412.769989  396.640015  410.170013  410.170013   7782400

1009 rows × 7 columns

4. | Display the head of the data

In [7]:
SP.head(10)
Out[7]:
         Date        Open        High         Low       Close   Adj Close    Volume
0  2018-02-05  262.000000  267.899994  250.029999  254.259995  254.259995  11896100
1  2018-02-06  247.699997  266.700012  245.000000  265.720001  265.720001  12595800
2  2018-02-07  266.579987  272.450012  264.329987  264.559998  264.559998   8981500
3  2018-02-08  267.079987  267.619995  250.000000  250.100006  250.100006   9306700
4  2018-02-09  253.850006  255.800003  236.110001  249.470001  249.470001  16906900
5  2018-02-12  252.139999  259.149994  249.000000  257.950012  257.950012   8534900
6  2018-02-13  257.290009  261.410004  254.699997  258.269989  258.269989   6855200
7  2018-02-14  260.470001  269.880005  260.329987  266.000000  266.000000  10972000
8  2018-02-15  270.029999  280.500000  267.630005  280.269989  280.269989  10759700
9  2018-02-16  278.730011  281.959991  275.690002  278.519989  278.519989   8312400
In [8]:
print(SP[SP.isnull().any(axis=1)])
Empty DataFrame
Columns: [Date, Open, High, Low, Close, Adj Close, Volume]
Index: []
In [9]:
# The dataset is stored in the pandas DataFrame 'SP'
desired_value = 275.690002

# Filter the DataFrame on an exact value in the 'Low' column
filtered_data = SP[SP['Low'] == desired_value]

# Print the filtered data
print(filtered_data)

# Second way to find a specific value and its row: filter on 'Volume'
value = SP[SP['Volume'] == 10972000]
print(value)
         Date        Open        High         Low       Close   Adj Close  \
9  2018-02-16  278.730011  281.959991  275.690002  278.519989  278.519989   

    Volume  
9  8312400  
         Date        Open        High         Low  Close  Adj Close    Volume
7  2018-02-14  260.470001  269.880005  260.329987  266.0      266.0  10972000
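
The exact `==` match on `Low` works here because the value was typed with the same digits that appear in the data. With a rounded value, or one that has been through arithmetic, an exact float comparison can return an empty result. A minimal sketch of a tolerance-based match follows; the `sample` frame is a hypothetical stand-in built from three rows shown above, and the tolerance of `1e-4` is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Three rows copied from the head of the dataset, for illustration only.
sample = pd.DataFrame({
    "Date": ["2018-02-14", "2018-02-15", "2018-02-16"],
    "Low": [260.329987, 267.630005, 275.690002],
    "Volume": [10972000, 10759700, 8312400],
})

# Tolerance-based match: finds the row even when the target is rounded.
low_match = sample[np.isclose(sample["Low"], 275.69, atol=1e-4)]

# Integer columns such as Volume can safely be compared exactly.
volume_match = sample[sample["Volume"] == 10972000]

print(low_match)
print(volume_match)
```

Exact comparison remains appropriate for integer columns such as `Volume`; the tolerance only matters for floating-point prices.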

5. | Get basic information

In [10]:
SP.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1009 entries, 0 to 1008
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       1009 non-null   object 
 1   Open       1009 non-null   float64
 2   High       1009 non-null   float64
 3   Low        1009 non-null   float64
 4   Close      1009 non-null   float64
 5   Adj Close  1009 non-null   float64
 6   Volume     1009 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 55.3+ KB

6. | Convert the Date column in SP to datetime

In [11]:
SP['Date'] = pd.to_datetime(SP['Date'])
In [12]:
print(f'Dataframe contains SP between {SP.Date.min()} and {SP.Date.max()}')
Dataframe contains SP between 2018-02-05 00:00:00 and 2022-02-04 00:00:00
In [13]:
print(f'Total days = {(SP.Date.max()  - SP.Date.min()).days}')
Total days = 1460
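
The figure 1460 is the calendar span between the first and last dates, not the number of observations: the dataset has 1009 rows because only trading days are recorded. The sketch below separates the two quantities and also shows that the date conversion can happen while reading the file. The in-memory `csv_text` is a hypothetical stand-in for the CSV, holding only the first two and last two rows shown above.

```python
import io
import pandas as pd

# Stand-in for the CSV file: a few rows from the dataset (Date and Close only).
csv_text = """Date,Close
2018-02-05,254.259995
2018-02-06,265.720001
2022-02-03,405.600006
2022-02-04,410.170013
"""

# parse_dates converts the column while reading, so no separate to_datetime call is needed.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

calendar_days = (df["Date"].max() - df["Date"].min()).days  # span in calendar days
rows_present = len(df)                                      # rows actually in the sample

print(f"Calendar span: {calendar_days} days, rows in sample: {rows_present}")
```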

7. | Descriptive Statistics of Numeric Variables

In [14]:
SP.describe()
Out[14]:
              Open         High          Low        Close    Adj Close        Volume
count  1009.000000  1009.000000  1009.000000  1009.000000  1009.000000  1.009000e+03
mean    419.059673   425.320703   412.374044   419.000733   419.000733  7.570685e+06
std     108.537532   109.262960   107.555867   108.289999   108.289999  5.465535e+06
min     233.919998   250.649994   231.229996   233.880005   233.880005  1.144000e+06
25%     331.489990   336.299988   326.000000   331.619995   331.619995  4.091900e+06
50%     377.769989   383.010010   370.880005   378.670013   378.670013  5.934500e+06
75%     509.130005   515.630005   502.529999   509.079987   509.079987  9.322400e+06
max     692.349976   700.989990   686.090027   691.690002   691.690002  5.890430e+07

8. | Box plot of the price columns

In [15]:
SP[['Open','High','Low','Close','Adj Close']].plot(kind='box')
Out[15]:
<Axes: >

9. | Stock Price Plot

Date-wise closing price of the stock

In [16]:
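layout = go.Layout(
    title='Stock price plot')
In [17]:
SP_data = [{'x': SP['Date'], 'y': SP['Close']}]
# Pass the layout so the title is shown; the name 'fig' avoids shadowing plotly.offline.plot
fig = go.Figure(data=SP_data, layout=layout)
In [18]:
fig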

10. | Import scikit-learn and split the data

In [19]:
from sklearn.model_selection import train_test_split
In [20]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
In [21]:
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
In [22]:
X = np.array(SP.index).reshape(-1,1)
Y = SP['Close']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101)
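
The split above shuffles the rows, so the test set contains days that lie between training days. For time-ordered data, a chronological split is a common alternative because it evaluates the model only on days after the training period. The sketch below is an illustration of that alternative, not the method used in this notebook; `X_demo` and `y_demo` are hypothetical stand-ins with a synthetic linear price.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 1009-row series: day index and a synthetic price.
X_demo = np.arange(1009).reshape(-1, 1)
y_demo = 250.0 + 0.2 * X_demo.ravel()

# shuffle=False keeps the row order, so the last 30% of days form the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=False
)

# Every training day precedes every test day.
print(X_tr.max(), X_te.min())
```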
In [23]:
scaler = StandardScaler().fit(X_train)
scaler
Out[23]:
StandardScaler()
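
The scaler is fitted here, but the later cells train the model on the unscaled `X_train`. If scaling were wanted, the usual pattern is to fit on the training rows only and reuse those statistics for the test rows. A minimal sketch under that assumption, with hypothetical day indices in place of the real split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical day indices standing in for X_train and X_test.
X_train_demo = np.array([[0.0], [2.0], [4.0], [6.0]])
X_test_demo = np.array([[3.0], [8.0]])

# Fit on the training rows only, then reuse the same mean and scale for the test rows.
scaler_demo = StandardScaler().fit(X_train_demo)
X_train_scaled = scaler_demo.transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```

For a one-feature ordinary least squares fit, scaling the input changes the coefficient but not the predictions, so omitting it does not affect the scores reported later.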

11. | Stock price forecasting with a linear regression model

In [24]:
from sklearn.linear_model import LinearRegression
In [25]:
lm = LinearRegression()
lm.fit(X_train, Y_train)
Out[25]:
LinearRegression()
In [26]:
# Actual closing prices of the training rows
trace0 = go.Scatter(
    x=X_train.T[0],
    y=Y_train,
    mode='markers',
    name='Actual'
)
# Fitted regression line over the same rows
trace1 = go.Scatter(
    x=X_train.T[0],
    y=lm.predict(X_train).T,
    mode='lines',
    name='Predicted'
)

SP_data = [trace0, trace1]
layout.xaxis.title.text = 'Day'
In [27]:
plot2 = go.Figure(data=SP_data, layout=layout)
plot2
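
The plot shows the fitted line over the training days only. Because the feature is the row index, a forecast for later days can be obtained by predicting at indices beyond the last row. The sketch below illustrates this with a hypothetical 100-day series that follows an exact linear trend; `days`, `close`, and `future_days` are names introduced for the example. A straight-line extrapolation of this kind ignores volatility and should be read as a trend estimate only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical series: 100 trading days with a known linear trend.
days = np.arange(100).reshape(-1, 1)
close = 250.0 + 0.5 * days.ravel()

model = LinearRegression().fit(days, close)

# The next five day indices continue where the training index stops.
future_days = np.arange(100, 105).reshape(-1, 1)
forecast = model.predict(future_days)

print(forecast)
```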

The next cell builds a string, scores, that reports two metrics for the linear regression model (lm) on the training set (X_train, Y_train) and the test set (X_test, Y_test): the R^2 score and the mean squared error (MSE). An f-string embeds the results of the r2_score and mse functions. The ljust and center methods align the header row, and the values themselves are separated by a tab. Printing scores shows a small table with one row per metric and one column each for the training and test sets.

In [28]:
scores = f'''
{'Metric'.ljust(10)}{'Train'.center(20)}{'Test'.center(20)}
{'r2_score'.ljust(10)}{r2_score(Y_train, lm.predict(X_train))}\t{r2_score(Y_test, lm.predict(X_test))}
{'MSE'.ljust(10)}{mse(Y_train, lm.predict(X_train))}\t{mse(Y_test, lm.predict(X_test))}
'''
print(scores)
Metric           Train                Test        
r2_score  0.6992669032944175	0.7261648669848495
MSE       3403.003880002517	3460.9885809580633
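
To make the two metrics concrete, the sketch below computes them by hand and compares the results with scikit-learn. The `y_true` and `y_pred` arrays are hypothetical values chosen for illustration, not taken from the model above. It also derives the root mean squared error (RMSE), which is in the same units as the price; for the test MSE reported above, the RMSE is about 58.8.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted prices, for illustration only.
y_true = np.array([250.0, 260.0, 270.0, 280.0])
y_pred = np.array([252.0, 258.0, 273.0, 277.0])

residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)             # average squared error
ss_res = np.sum(residuals ** 2)                  # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot                # share of variation explained
rmse = np.sqrt(mse_manual)                       # error in price units

print(mse_manual, r2_manual, rmse)
```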

THANK YOU 😊

Submitted by Rohit Varathe

